Operators for transforming kernels into quasi-local kernels that improve SVM accuracy
Motivated by the crucial role that locality plays in various learning approaches, we present, in the framework of kernel machines for classification, a novel family of operators on kernels that integrate local information into any kernel to obtain quasi-local kernels. The quasi-local kernels preserve the possibly global properties of the input kernel and increase the kernel value as points get closer in the feature space of the input kernel, mixing the effect of the input kernel with a kernel that is local in the feature space of the input one. When applied to a local kernel, the operators introduce an additional level of locality, equivalent to using a local kernel with a non-stationary kernel width. The operators accept two parameters: one regulates the width of the exponential influence of points in the locality-dependent component, and the other balances the feature-space local component against the input kernel. We address the choice of these parameters with a data-dependent strategy. Experiments with SVMs, applying the operators to traditional kernel functions on a total of 43 datasets with different characteristics and application domains, achieve very good results supported by statistical significance.
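As a rough illustration of the idea above, the Python sketch below wraps an arbitrary kernel K with a component that is local in K's own feature space, obtaining the feature-space distance through the kernel trick: d_K(x, y)^2 = K(x, x) + K(y, y) - 2K(x, y). The additive form K + lam * exp(-d_K^2 / sigma) and the names `quasi_local`, `sigma` and `lam` are assumptions made for this example; they stand in for the width and balancing parameters mentioned in the abstract and are not the paper's exact operator.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain dot product, used here as an example of a global input kernel."""
    return sum(a * b for a, b in zip(x, y))


def quasi_local(kernel: Kernel, sigma: float, lam: float) -> Kernel:
    """Hypothetical operator: K'(x, y) = K(x, y) + lam * exp(-d_K(x, y)^2 / sigma).

    d_K is the distance between x and y in the feature space of `kernel`.
    `sigma` sets the width of the local component; `lam` (>= 0) sets its weight.
    """
    def k(x: Vector, y: Vector) -> float:
        # Squared feature-space distance via the kernel trick.
        d2 = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
        d2 = max(d2, 0.0)  # guard against tiny negative values from rounding
        return kernel(x, y) + lam * math.exp(-d2 / sigma)
    return k


k = quasi_local(linear_kernel, sigma=1.0, lam=0.5)
same = k([1.0, 0.0], [1.0, 0.0])  # identical points receive the full local bonus lam
far = k([1.0, 0.0], [0.0, 1.0])   # orthogonal points: d_K^2 = 2, so the bonus is lam * e^-2
```

For lam >= 0 the result stays positive semidefinite: a Gaussian of the feature-space distance is itself a valid kernel, and a non-negative combination of valid kernels is a valid kernel.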
Profiling Instances in Noise Reduction
The dependency on the quality of the training data has led to significant work in noise reduction for instance-based learning algorithms. This paper presents an empirical evaluation of current noise reduction techniques, not just from the perspective of their comparative performance, but also from the perspective of investigating the types of instances that they focus on for removal. A novel instance profiling technique known as RDCL profiling allows the structure of a training set to be analysed at the instance level, categorising each instance by modelling its local competence properties. This profiling approach offers the opportunity to investigate the types of instances removed by the noise reduction techniques currently in use in instance-based learning. The paper also considers the effect of removing instances with specific profiles from a dataset and shows that a very simple approach, removing instances that are misclassified by the training set and that cause other instances in the dataset to be misclassified, is an effective noise reduction technique.
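The simple removal rule described at the end of the abstract can be sketched with a leave-one-out k-nearest-neighbour classifier. The reading used here is an assumption: an instance i is taken to cause the misclassification of j when i is one of j's k nearest neighbours, carries a different label, and j is misclassified. The function names and the use of squared Euclidean distance are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i], excluding i itself (leave-one-out)."""
    def sq_dist(j: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    others = [j for j in range(len(points)) if j != i]
    # sorted() is stable, so ties are broken by original index.
    return sorted(others, key=sq_dist)[:k]


def noisy_instances(points: Sequence[Point], labels: Sequence[str], k: int = 3) -> List[int]:
    """Flag instances that are misclassified AND help misclassify others (hypothetical reading)."""
    n = len(points)
    neighbours = [knn(points, i, k) for i in range(n)]
    # Leave-one-out k-NN prediction: majority label among the k neighbours.
    predicted = [Counter(labels[j] for j in neighbours[i]).most_common(1)[0][0] for i in range(n)]
    misclassified = [predicted[i] != labels[i] for i in range(n)]
    flagged = set()
    for j in range(n):
        if not misclassified[j]:
            continue
        for i in neighbours[j]:
            # i votes against j's true label and is itself misclassified.
            if labels[i] != labels[j] and misclassified[i]:
                flagged.add(i)
    return sorted(flagged)
```

Because both conditions must hold, a mislabelled point that is simply outvoted by its neighbours is kept; only instances that also mislead others are flagged for removal.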
Simultaneous Quantification of Multiple Bacteria by the BactoChip Microarray Designed to Target Species-Specific Marker Genes
Bacteria are ubiquitous throughout the environment, the most abundant inhabitants of the healthy human microbiome, and causal pathogens in a variety of diseases. Their identification in disease is often an essential step in rapid diagnosis and targeted intervention, particularly in clinical settings. At present, clinical bacterial detection and discrimination are primarily culture-based, requiring both time and microbiological expertise, especially for bacteria that are not easily cultivated. Higher-throughput molecular methods based on PCR amplification or, more recently, microarrays are reaching the clinic as well. However, these methods are currently restricted to a small set of microbes or are based on conserved phylogenetic markers such as the 16S rRNA gene, which are difficult to resolve at the species or strain level. Here, we designed and experimentally validated the BactoChip, an oligonucleotide microarray for bacterial detection and quantification. The chip allows the culture-independent identification of bacterial species and also determines their relative abundances in complex communities, such as those of the commensal microbiota or those found in clinical settings. The microarray successfully distinguished among bacterial species from 21 different genera using 60-mer probes targeting a novel set of high-resolution marker genes identified in silico. The BactoChip additionally proved accurate in determining species-level relative abundances over a 100-fold dynamic range in complex bacterial communities, with a low limit of detection (0.1%). Combined with the continually increasing number of sequenced bacterial genomes, future iterations of the technology could enable highly accurate, clinically oriented tools for rapid assessment of bacterial community composition and relative abundances.
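To make the quantification step concrete, here is a hypothetical sketch of turning per-probe signals into species-level relative abundances. It assumes background-corrected intensities grouped by the species each probe targets, summarises each species by the median of its probes, and treats the 0.1% limit of detection quoted in the abstract as a hard reporting threshold. None of these choices are taken from the BactoChip protocol, and the function name is illustrative.

```python
from statistics import median
from typing import Dict, Sequence

DETECTION_LIMIT = 0.001  # 0.1%, the limit of detection quoted in the abstract


def relative_abundances(probe_signal: Dict[str, Sequence[float]],
                        limit: float = DETECTION_LIMIT) -> Dict[str, float]:
    """Estimate species relative abundances from per-probe intensities (hypothetical scheme)."""
    # Summarise each species by the median of its probes, which is robust to a few outlier probes.
    signal = {sp: median(vals) for sp, vals in probe_signal.items() if vals}
    total = sum(signal.values())
    if total <= 0:
        return {}
    fractions = {sp: s / total for sp, s in signal.items()}
    # Species below the detection limit are reported as absent.
    detected = {sp: f for sp, f in fractions.items() if f >= limit}
    kept = sum(detected.values())
    # Renormalise so that the reported abundances sum to 1.
    return {sp: f / kept for sp, f in detected.items()}


ab = relative_abundances({
    "A": [100.0, 110.0, 90.0],
    "B": [10.0, 12.0, 8.0],
    "C": [0.05, 0.05, 0.05],  # falls below 0.1% of the total signal
})
```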
PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes
New microbial genomes are constantly being sequenced, and it is crucial to accurately determine their taxonomic identities and evolutionary relationships. Here we report PhyloPhlAn, a new method to assign microbial phylogeny and putative taxonomy using >400 proteins optimized from among 3,737 genomes. This method measures the sequence diversity of all clades, classifies genomes from deep-branching candidate divisions through closely related subspecies, and improves consistency between phylogenetic and taxonomic groupings. PhyloPhlAn improved taxonomic accuracy for existing and newly sequenced genomes, detecting 157 erroneous labels, correcting 46, and placing or refining 130 new genomes. We provide examples of accurate classifications from subspecies (Sulfolobus spp.) to phyla, and of preliminary rooting of deep-branching candidate divisions, including consistent statistical support for Caldiserica (formerly candidate division OP5). PhyloPhlAn will thus be useful for both phylogenetic assessment and taxonomic quality control of newly sequenced genomes. The final phylogenies, conserved protein sequences, and open-source implementation are available online.
Prevotella diversity, niches and interactions with the human host
The genus Prevotella includes more than 50 characterized species that occur in varied natural habitats, although most Prevotella spp. are associated with humans. In the human microbiome, Prevotella spp. are highly abundant in various body sites, where they are key players in the balance between health and disease. Host factors related to diet, lifestyle and geography are fundamental in affecting the diversity and prevalence of Prevotella species and strains in the human microbiome. These factors, along with the ecological relationship of Prevotella with other members of the microbiome, likely determine the extent of the contribution of Prevotella to human metabolism and health. Here we review the diversity, prevalence and potential connection of Prevotella spp. in the human host, highlighting how genomic methods and analysis have improved, and should further help in framing, their ecological role. We also provide suggestions for future research to improve understanding of the possible functions of Prevotella spp. and the effects of the Western lifestyle and diet on the host-Prevotella symbiotic relationship in the context of maintaining human health.
Metagenomic microbial community profiling using unique clade-specific marker genes
Metagenomic shotgun sequencing data can identify microbes populating a microbial community and their proportions, but existing taxonomic profiling methods are inefficient for increasingly large datasets. We present an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches. We validated MetaPhlAn on terabases of short reads and provide the largest metagenomic profiling to date of the human gut…
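The core idea, that each read is credited only to the clade whose unique marker it hits, can be sketched as below. The sketch assumes reads have already been mapped to marker genes, normalises hit counts by marker length and averages them per clade; MetaPhlAn's own estimator may differ, so the function and argument names here are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def clade_abundances(read_hits: Iterable[str],
                     marker_clade: Dict[str, str],
                     marker_length: Dict[str, int]) -> Dict[str, float]:
    """Relative clade abundances from reads already mapped to clade-specific markers."""
    # Count reads per marker; hits to markers outside the catalogue are ignored.
    hits = Counter(m for m in read_hits if m in marker_clade)
    # Length-normalised coverage of every marker, grouped by the clade it is unique to.
    per_clade = defaultdict(list)
    for marker, clade in marker_clade.items():
        per_clade[clade].append(hits[marker] / marker_length[marker])
    # Average coverage over each clade's markers (markers with no hits count as zero).
    coverage = {clade: sum(c) / len(c) for clade, c in per_clade.items()}
    total = sum(coverage.values())
    if total == 0:
        return {}
    return {clade: c / total for clade, c in coverage.items()}


ab = clade_abundances(
    ["m1", "m1", "m2", "m2", "m2", "m2", "m3", "unmapped"],
    marker_clade={"m1": "X", "m2": "X", "m3": "Y"},
    marker_length={"m1": 100, "m2": 200, "m3": 100},
)
```

Normalising by marker length keeps long markers from inflating a clade's estimate simply because they attract more reads.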
MetaRef: a pan-genomic database for comparative and community microbial genomics
Microbial genome sequencing is one of the longest-standing areas of biological database development, but high-throughput, low-cost technologies have increased its throughput to an unprecedented number of new genomes per year. Several thousand microbial genomes are now available, necessitating new approaches to organizing information on gene function, phylogeny and microbial taxonomy to facilitate downstream biological interpretation. MetaRef, available at http://metaref.org, is a novel online resource systematically cataloguing a comprehensive pan-genome of all microbial clades with sequenced isolates. It organizes currently available draft and finished bacterial and archaeal genomes into quality-controlled clades, reports all core and pan gene families at multiple levels in the resulting taxonomy, and annotates families' conservation, phylogeny and consensus functional information. MetaRef also provides a comprehensive non-redundant reference gene catalogue for metagenomic studies, including the abundance and prevalence of all gene families in the >700 shotgun metagenomic samples of the Human Microbiome Project. This constitutes a systematic mapping of clade-specific microbial functions within the healthy human microbiome across multiple body sites and can be used as a reference for identifying potential functional biomarkers in disease-associated microbiomes. MetaRef provides all information both as an online browsable resource and as downloadable sequences and tabular data files that can be used for subsequent offline studies.
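Core and pan gene families, as catalogued by MetaRef, can be illustrated with plain set operations: the core is the set of families shared by every genome in a clade, and the pan-genome is their union. This strict "present in every genome" definition is an assumption for the sketch; MetaRef's own criteria, especially for draft genomes, may differ. The function name is hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene families for one clade.

    `genomes` maps a genome ID to the set of gene families found in that genome.
    """
    family_sets = list(genomes.values())
    if not family_sets:
        return set(), set()
    core = set.intersection(*family_sets)  # present in every genome
    pan = set.union(*family_sets)          # present in at least one genome
    return core, pan


core, pan = core_and_pan({
    "g1": {"a", "b", "c"},
    "g2": {"a", "b"},
    "g3": {"a", "b", "d"},
})
accessory = pan - core  # families carried by only some genomes of the clade
```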
Metagenomic biomarker discovery and explanation
This study describes and validates a new method for metagenomic biomarker discovery by way of class comparison, tests of biological consistency and effect size estimation. This addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, which is a central problem in the study of metagenomics. We extensively validate our method on several microbiomes, and a convenient online interface for the method is provided at http://huttenhower.sph.harvard.edu/lefse/.
Funding: National Institute of Dental and Craniofacial Research (U.S.) (grant DE017106); National Institutes of Health (U.S.) (NIH grant AI078942); Burroughs Wellcome Fund; National Institutes of Health (U.S.) (NIH 1R01HG005969).
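The class-comparison and effect-size steps can be sketched for two classes as follows. A Kruskal-Wallis H test (with average ranks and the usual tie correction) flags features that differ between classes, and a log10 fold change of class means stands in for the LDA effect size used by LEfSe; the biological-consistency step is omitted. All names and thresholds here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

CHI2_CRIT_1DF = 3.841  # chi-square critical value for 1 degree of freedom at alpha = 0.05


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    if n < 2:
        return 0.0
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def find_biomarkers(features: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                    min_log10_fold: float = 1.0) -> Dict[str, float]:
    """Keep features that differ significantly between two classes and have a large effect."""
    selected = {}
    for name, (class_a, class_b) in features.items():
        if kruskal_wallis_h([class_a, class_b]) < CHI2_CRIT_1DF:
            continue  # no significant difference between the classes
        mean_a = sum(class_a) / len(class_a)
        mean_b = sum(class_b) / len(class_b)
        # Small pseudo-count keeps the log finite when one class mean is zero.
        effect = abs(math.log10((mean_a + 1e-9) / (mean_b + 1e-9)))
        if effect >= min_log10_fold:
            selected[name] = effect
    return selected


res = find_biomarkers({
    "taxonA": ([10, 12, 11, 13, 9], [0.5, 1, 0.5, 1, 0.5]),  # clearly shifted between classes
    "taxonB": ([5, 6, 5, 6, 5], [5, 6, 6, 5, 5]),            # same distribution in both classes
})
```

Running the significance test first and the effect-size filter second mirrors the order described in the abstract: a feature must both differ consistently and differ by a meaningful amount.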
Computational meta'omics for microbial community studies
Complex microbial communities are an integral part of the Earth's ecosystem and of our bodies in health and disease. In the last two decades, culture-independent approaches have provided new insights into their structure and function, with the exponentially decreasing cost of high-throughput sequencing resulting in broadly available tools for microbial surveys. However, the field remains far from reaching a technological plateau, as both computational techniques and nucleotide sequencing platforms for microbial genomic and transcriptional content continue to improve. Current microbiome analyses are thus starting to adopt multiple and complementary meta'omic approaches, leading to unprecedented opportunities to comprehensively and accurately characterize microbial communities and their interactions with their environments and hosts. This diversity of available assays, analysis methods, and public data is in turn beginning to enable microbiome-based predictive and modeling tools. We thus review here the technological and computational meta'omics approaches that are already available, those that are under active development, their success in biological discovery, and several outstanding challenges.
Longitudinal survey of microbiome associated with particulate matter in a megacity.
Background: While the physical and chemical properties of airborne particulate matter (PM) have been extensively studied, its associated microbiome remains largely unexplored. Here, we performed a longitudinal metagenomic survey of 106 samples of airborne PM2.5 and PM10 in Beijing over a period of 6 months in 2012 and 2013, including samples from several historically severe smog events.
Results: We observed that the microbiome composition and functional potential were conserved between PM2.5 and PM10, although considerable temporal variation existed. Among the airborne microorganisms, Propionibacterium acnes, Escherichia coli, Acinetobacter lwoffii, Lactobacillus amylovorus, and Lactobacillus reuteri dominated, along with several viral species. We further identified an extensive repertoire of genes involved in antibiotic resistance and detoxification, including transporters, transpeptidases, and thioredoxins. Sample stratification based on the Air Quality Index (AQI) demonstrated that many microbial species, including those associated with human, dog, and mouse feces, exhibit AQI-dependent incidence dynamics. The phylogenetic and functional diversity of the air microbiome is comparable to that of soil and water environments, as its composition likely derives from a wide variety of sources.
Conclusions: Airborne particulate matter accommodates rich and dynamic microbial communities, including a range of microbial elements that are associated with potential health consequences.